157 research outputs found

    The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

    An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e. the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence
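To make the supported operations concrete, here is a minimal uncompressed baseline in Python. It is not the Wavelet Trie and makes no attempt at compression. The class name and the method names (`access`, `rank`, `select`, `rank_prefix`, `append`) are assumptions chosen for illustration and are not taken from the paper.

```python
class NaiveIndexedSequence:
    """Uncompressed reference for an indexed sequence of strings.

    Every query scans the sequence, so this only pins down the
    semantics of the operations, not their efficient implementation.
    """

    def __init__(self, strings):
        self.seq = list(strings)

    def access(self, pos):
        """Random access: the string stored at position pos."""
        return self.seq[pos]

    def rank(self, s, pos):
        """Number of exact occurrences of s in seq[0:pos]."""
        return sum(1 for x in self.seq[:pos] if x == s)

    def select(self, s, idx):
        """Position of the idx-th (0-based) occurrence of s, or -1 if absent."""
        seen = 0
        for i, x in enumerate(self.seq):
            if x == s:
                if seen == idx:
                    return i
                seen += 1
        return -1

    def rank_prefix(self, p, pos):
        """Number of strings in seq[0:pos] that start with the prefix p."""
        return sum(1 for x in self.seq[:pos] if x.startswith(p))

    def append(self, s):
        """Dynamic setting: s may be a string never seen before."""
        self.seq.append(s)
```

A range count over positions `[a, b)` follows as `rank(s, b) - rank(s, a)`, and likewise for prefixes with `rank_prefix`.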

    Spectral approximation algorithms for graph cut problems

    The maximum cut problem in a graph (MaxCUT) consists of finding a partition of the nodes into two parts such that the number of edges with one endpoint in each part is maximized. It is one of the most natural and interesting combinatorial problems on graphs. Unfortunately, like many interesting problems, it is NP-complete: it is one of the 21 problems that Karp proved NP-complete in the 1972 paper that laid the foundations of the theory of reductions. Since efficient exact algorithms are therefore ruled out (assuming P ≠ NP), research on combinatorial optimization problems has moved onto two dual fronts:
    - The development of approximation algorithms, that is, algorithms that find a suboptimal solution to the problem that is provably close to the optimal one.
    - The theory of inapproximability, which studies the maximum precision that polynomial-time algorithms can achieve.
    Whenever a problem is shown not to be approximable beyond a certain theoretical limit (usually a function of the optimum), and this limit is attained by an approximation algorithm, the approximation complexity of the problem is essentially settled, since both the algorithm and the inapproximability result are optimal. In this thesis we describe an algorithm, presented in a recent work by Trevisan, that uses spectral techniques to approximate MaxCUT. The goal of this thesis is to provide an experimental analysis showing that the algorithm is applicable to large-scale problems. To develop the thesis, we will recall the concept of the Laplacian of a graph and some elementary results of spectral graph theory. We will define the SparsestCUT and edge expansion problems, Cheeger-type inequalities, and the spectral partitioning algorithm, which are the first historical applications of spectral theory and which inspired the spectral algorithm for MaxCUT.
    We will then focus on MaxCUT. We will describe the algorithm of Goemans and Williamson, who in a 1995 paper presented the first nontrivial approximation, with precision alpha_GW = 0.878..., that is, one that returns a solution of value at least alpha_GW times the optimum. The algorithm is based on a geometric relaxation of an integer quadratic programming problem equivalent to MaxCUT. This problem can be formulated as a semidefinite program and is therefore solvable in polynomial time. The solution of the relaxation is then rounded to obtain a feasible solution of the integer quadratic program. No algorithm has been discovered that guarantees a better worst-case approximation ratio than that of Goemans-Williamson. We will see how the Unique Games Conjecture, a recent conjecture by Khot et al., would imply the optimality of the semidefinite algorithm. The conjecture is used to construct a PCP (Probabilistically Checkable Proof) verifier reducible to MaxCUT. The analysis of the verifier relies on a recently proved conjecture: the Majority is Stablest theorem, on the extremal properties of monotone functions {0,1}^n -> R. Another fundamental ingredient is the Fourier analysis of finite functions, a field that has recently been very active because of its applications in combinatorics and theoretical computer science. Finally, we will present the spectral algorithm for MaxCUT. Unlike Goemans-Williamson-style approaches, which are based on semidefinite programming, it uses the principal eigenvector of the Laplacian to find an embedding of the graph in the line. A linear rounding algorithm then extracts subgraphs with good cuts from the embedding, and the procedure is repeated recursively. The approximation ratio achieved is at least 0.507....
    Compared with the semidefinite algorithm, the spectral algorithm has an elementary analysis and is easy to implement, since the only complex step is the computation of an eigenvector of a matrix as sparse as the graph. Using existing libraries for the numerical computation of eigenvectors of sparse matrices, the algorithm can be applied to graphs of considerable size. We will therefore present an implementation and an experimental analysis, which shows that the algorithm is very efficient in practice and which raises new open problems concerning its theoretical analysis
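The core step of a spectral approach to MaxCUT can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm analysed in the thesis. It assumes the unnormalized Laplacian L = D - A, approximates its top eigenvector by power iteration on adjacency lists, and rounds by sign only. The threshold rounding and the recursion described in the abstract are omitted, so no approximation guarantee is claimed. The function names are hypothetical.

```python
import math
import random


def top_laplacian_eigenvector(n, edges, iters=500, seed=0):
    """Approximate the eigenvector of L = D - A with the largest eigenvalue.

    Uses plain power iteration; each step costs O(n + m) because only
    adjacency lists are touched, mirroring the sparsity of the graph.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # (L x)[u] = deg(u) * x[u] - sum of x over the neighbours of u
        y = [len(adj[u]) * x[u] - sum(x[v] for v in adj[u]) for u in range(n)]
        norm = math.sqrt(sum(t * t for t in y))
        if norm == 0.0:  # graph without edges, nothing to iterate on
            break
        x = [t / norm for t in y]
    return x


def spectral_cut(n, edges):
    """Embed the vertices on the line and split them by sign.

    Returns (side, cut): side[u] is True or False, and cut is the
    number of edges whose endpoints fall on different sides.
    """
    x = top_laplacian_eigenvector(n, edges)
    side = [t >= 0.0 for t in x]
    cut = sum(1 for u, v in edges if side[u] != side[v])
    return side, cut
```

On a bipartite graph such as an even cycle, the top eigenvector alternates in sign along the bipartition, so the sign rounding recovers the cut containing every edge. For large graphs a sparse eigensolver from a numerical library would replace the hand-written power iteration.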

    Cache-Oblivious Peeling of Random Hypergraphs

    The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random $r$-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires $O(\mathrm{sort}(n))$ I/Os and $O(n \log n)$ time to peel a random hypergraph with $n$ edges. We experimentally evaluate the performance of our implementation of this algorithm in a real-world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds a MPHF of 7.6 billion keys in less than 21 hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets
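For reference, the straightforward in-memory peeling procedure that the abstract contrasts with can be sketched as follows. This is an illustrative Python version, not the cache-oblivious algorithm of the paper. The function name `peel` and the input representation (a list of vertex tuples, each with distinct vertices) are assumptions.

```python
from collections import deque


def peel(num_vertices, edges):
    """Compute a peeling order of a hypergraph.

    Repeatedly picks a vertex that belongs to exactly one remaining
    edge and removes that edge. Returns a list of (edge_index, vertex)
    pairs in removal order, or None if some edges can never be peeled.
    """
    incident = [set() for _ in range(num_vertices)]
    for i, e in enumerate(edges):
        for v in e:
            incident[v].add(i)

    # Vertices of degree 1 are the candidates for the next removal.
    queue = deque(v for v in range(num_vertices) if len(incident[v]) == 1)
    order = []
    while queue:
        v = queue.popleft()
        if len(incident[v]) != 1:  # degree changed since v was queued
            continue
        (i,) = incident[v]
        order.append((i, v))
        for w in edges[i]:
            incident[w].discard(i)
            if len(incident[w]) == 1:
                queue.append(w)
    return order if len(order) == len(edges) else None
```

The random accesses into `incident` are what make this approach slow once the hypergraph no longer fits in internal memory.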

    Space-Efficient Data Structures for Collections of Textual Data

    This thesis focuses on the design of succinct and compressed data structures for collections of string-based data, specifically sequences of semi-structured documents in textual format, sets of strings, and sequences of strings. The study of such collections is motivated by a large number of applications both in theory and practice. For textual semi-structured data, we introduce the concept of semi-index, a succinct construction that speeds up the access to documents encoded with textual semi-structured formats, such as JSON and XML, by storing separately a compact description of their parse trees, hence avoiding the need to re-parse the documents every time they are read. For string dictionaries, we describe a data structure based on a path decomposition of the compacted trie built on the string set. The tree topology is encoded using succinct data structures, while the node labels are compressed using a simple dictionary-based scheme. We also describe a variant of the path-decomposed trie for scored string sets, where each string has a score. This data structure can efficiently support top-k completion queries, that is, given a string p and an integer k, return the k highest scored strings among those prefixed by p. For sequences of strings, we introduce the problem of compressed indexed sequences of strings, that is, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while supporting random access, searching, and counting operations, both for exact matches and prefix search. We present a new data structure, the Wavelet Trie, that solves the problem by combining a Patricia trie with a wavelet tree. The Wavelet Trie improves on the state-of-the-art compressed data structures for sequences by supporting a dynamic alphabet and prefix queries. Finally, we discuss the issue of the practical implementation of the succinct primitives used throughout the thesis for the experiments. These primitives are implemented as part of a publicly available library, Succinct, using state-of-the-art algorithms along with some improvements
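The top-k completion query defined above can be illustrated with a small uncompressed baseline in Python. This is only a sketch of the query semantics using a sorted list, not the path-decomposed trie from the thesis. The function name and the input layout (a list of `(string, score)` pairs sorted by string) are assumptions.

```python
import bisect
import heapq


def top_k_completions(scored, p, k):
    """Return the k highest-scored (string, score) pairs prefixed by p.

    `scored` must be sorted by string, so that all strings sharing the
    prefix p form one contiguous run starting at the first key >= p.
    """
    keys = [s for s, _ in scored]
    lo = bisect.bisect_left(keys, p)
    hi = lo
    while hi < len(keys) and keys[hi].startswith(p):
        hi += 1
    return heapq.nlargest(k, scored[lo:hi], key=lambda pair: pair[1])
```

For example, with the scored set `ba:1, bar:5, bat:9, bath:2, cat:7`, the query `p = "ba"`, `k = 2` returns `bat` and `bar`. This baseline scans every string that matches the prefix, so a query takes time proportional to the number of matches.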

    Exercise Ameliorates Endocrine Pancreas Damage Induced by Chronic Cola Drinking in Rats

    Purpose: This study evaluates whether the daily practice of an exercise routine might protect from endocrine pancreas damage in cola drinking rats. Methods: Forty-eight Wistar rats were randomly assigned to 4 groups depending on a) beverage consumption ad libitum, water (W) or cola beverage (C), and b) physical activity, sedentary (S) or treadmill running (R). Accordingly, 4 groups were studied: WS (water sedentary), WR (water runner), CS (cola sedentary) and CR (cola runner). Body weight, nutritional data, plasma levels of glucose, creatinine, total cholesterol and cholesterol fractions, and triglycerides (enzymocolorimetry), and systolic blood pressure (plethysmography) were measured. After 6 months, euthanasia was performed (overdose sodium thiopental). Pancreatic tissue was immediately excised and conventionally processed for morphometrical and immunohistochemical determinations. Results: The effects of running and chronic cola drinking on pancreas morphology showed interaction (p<0.001) rather than simple summation. Cola drinking (CS vs WS) reduced median pancreatic islet area (-30%, 1.8 × 10⁴ μm² vs 2.58 × 10⁴ μm², p<0.0001) and median β-cell mass (-43%, 3.81 mg vs 6.73 mg, p<0.0001), and increased median α/β ratio (+49%, 0.64 vs 0.43, p<0.001). In water drinking rats (WR vs WS), running reduced median α-cell mass (-48%, 1.48 mg vs 2.82 mg, p<0.001) and α/β ratio (-56%, 0.19 vs 0.43, p<0.0001). By contrast, in cola drinking rats (CR vs CS), running partially restored median islet area (+15%, 2.06 × 10⁴ μm² vs 1.79 × 10⁴ μm², p<0.05), increased median β-cell mass (+47%, 5.59 mg vs 3.81 mg, p<0.0001) and reduced median α/β ratio (-6%, 0.60 vs 0.64, p<0.05). Conclusion: This study is likely the first reporting experimental evidence of the beneficial effect of exercise on pancreatic morphology in cola-drinking rats. Presently, the increase of nearly 50% in β-cell mass by running in cola drinking rats is by far the most relevant finding. 
Moderate running, advisably indicated in cola consumers and patients at risk of diabetes, finds here experimental support.
    Affiliation: Otero-Losada, Matilde Estela. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Gonzalez, Julian. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Muller, Maria Angelica. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Ottaviano, Graciela Mabel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Cao, Gabriel Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Azzato, Francisco. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Ambrosio, Giuseppe. Università di Perugia; Italy
    Affiliation: Milei, Jose. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentin

    Cardiorenal Involvement in Metabolic Syndrome Induced by Cola Drinking in Rats: Proinflammatory Cytokines and Impaired Antioxidative Protection

    We report experimental evidence confirming renal histopathology, proinflammatory mediators, and oxidative metabolism induced by cola drinking. Male Wistar rats drank ad libitum regular cola (C, n = 12) or tap water (W, n = 12). Measures: body weight, nutritional data, plasma glucose, cholesterol fractions, TG, urea, creatinine, coenzyme Q10, SBP, and echocardiograms (0 mo and 6 mo). At 6 months euthanasia was performed. Kidneys were processed for histopathology and immunohistochemistry (semiquantitative). Compared with W, C rats showed (I) overweight (+8%, p < 0.05), hyperglycemia (+11%, p < 0.05), hypertriglyceridemia (2-fold, p < 0.001), higher AIP (2-fold, p < 0.01), and lower Q10 level (−55%, p < 0.05); (II) increased LV diastolic diameter (+9%, p < 0.05) and volume (systolic +24%, p < 0.05), posterior wall thinning (−8%, p < 0.05), and larger cardiac output (+24%, p < 0.05); (III) glomerulosclerosis (+21%, p < 0.05), histopathology (+13%, p < 0.05), higher tubular expression of IL-6 (7-fold, p < 0.001), and TNF (4-fold, p < 0.001). (IV) Correlations were found for LV dimensions with IL-6 (74%, p < 0.001) and TNF (52%, p < 0.001) and were fully abolished after TG and Q10 control. Chronic cola drinking induced cardiac remodeling associated with increase in proinflammatory cytokines and renal damage. Hypertriglyceridemia and oxidative stress were key factors. Hypertriglyceridemic lipotoxicity in the context of defective antioxidant/anti-inflammatory protection due to low Q10 level might play a key role in cardiorenal disorder induced by chronic cola drinking in rats.
    Affiliation: Otero-Losada, Matilde Estela. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Gómez Llambí de Oromí, Hernán Jorge. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Ottaviano, Graciela Mabel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Cao, Gabriel Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Muller, Maria Angelica. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Azzato, Francisco. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Ambrosio, Giuseppe. Università di Perugia; Italy
    Affiliation: Milei, Jose. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentin

    IMPlementing split Regimen OVEr Single dose using a Plan-Do-Study-Act approach (IMPROVES study)

    Background and aims: A split-dose regimen for colonoscopy is recommended by international guidelines, but its adoption is still suboptimal. Our aim was to assess whether a Plan-Do-Study-Act approach (PDSA), a scientific method promoting quality improvement, would be able to improve adherence to a split-dose regimen, and to identify factors influencing its adoption. Methods: This study consisted of three phases: Cycle 1: a cross-sectional assessment of split-dose adherence in consecutive outpatients/inpatients undergoing colonoscopies in 74 Italian centers; Educational intervention: regional meetings with literature review, analysis of Cycle 1 data, and discussion on corrective measures; local diffusion of educational material and tools for improvement; Cycle 2: reassessment of split-dose adherence after spontaneous local interventions. Demographic, clinical, and procedural variables were systematically collected. Multivariate logistic regression was used to identify predictors of split-dose adoption. Results: In total, 8213 patients (mean age = 60.29 years (SD = 13.58), men = 54 %, outpatients = 88.4 %) were enrolled between 2013 and 2016 (Cycle 1 = 4189 patients and Cycle 2 = 4024 patients). Split-dose adoption rose from 29.1 % in Cycle 1 to 51.1 % in Cycle 2 (P < 0.0001), and being enrolled in Cycle 2 independently predicted split-dose adherence (OR = 2.9; 95 %CI 2.6 - 3.3). The adoption improved in all time slots, including colonoscopies scheduled before 0930. The main corrective measures were: rescheduling of colonoscopies after 0930 (between 0930 and 1130: OR = 2.6; 95 %CI 2.3 - 3.1; after 1130: OR = 7; 95 %CI 5.9 - 8.4); the cleansing regimen communicated by the Endoscopy unit (via form: OR = 1.6; 95 %CI 1.3 - 1.9; via visit: OR = 2.1; 95 %CI 1.7 - 2.5); a decrease in the use of deep sedation (OR = 2; 95 %CI 1.7 - 2.5). Conclusions: An educational intervention with observation-driven corrections through a PDSA approach was able to substantially increase the adoption of a split-dose regimen

    Beneficial effect of moderate exercise in kidney of rat after chronic consumption of cola drinks

    Aim: The purpose of this study was to investigate the effect of moderate intensity exercise on kidney in an animal model of high consumption of cola soft drinks. Methods: Forty-eight Wistar Kyoto rats (age: 16 weeks; weight: 350-400 g) were assigned to the following groups: WR (water runners) drank water and submitted to aerobic exercise; CR (cola runners) drank cola and submitted to aerobic exercise; WS (water sedentary) and CS (cola sedentary), not exercised groups. The aerobic exercise was performed for 5 days per week throughout the study (24 weeks) and the exercise intensity was gradually increased during the first 8 weeks until it reached 20 meters / minute for 30 minutes. Body weight, lipid profile, glycemia, plasma creatinine levels, atherogenic index of plasma (AIP) and systolic blood pressure (SBP) were determined. After 6 months all rats were sacrificed. A kidney histopathological score was obtained using a semiquantitative scale. Glomerular size and glomerulosclerosis were estimated by point-counting. The oxidative stress and proinflammatory status were explored by immunohistochemistry. A one way analysis of variance (ANOVA) with Tukey-Kramer post-hoc test or the Kruskal-Wallis test with Dunn's post-hoc test was used for statistics. A value of p < 0.05 was considered significant. Results: At 6 months, an increased consumption of cola soft drink was shown in CS and CR compared with water consumers (p<0.0001). Chronic cola consumption was associated with increased plasma triglycerides, AIP, heart rate, histopathological score, glomerulosclerosis, oxidative stress and pro-inflammatory status. On the other hand, moderate exercise prevented these findings. No difference was observed in the body weight, SBP, glycemia, cholesterol and plasma creatinine levels across experimental groups. Conclusions: This study warns about the consequences of chronic consumption of cola drinks on lipid metabolism, especially regarding renal health. 
Additionally, these findings emphasize the protective role of exercise training on renal damage.
    Affiliation: Cao, Gabriel Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Gonzalez, Julian. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Muller, Maria Angelica. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Ottaviano, Graciela Mabel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Ambrosio, Giuseppe. Università di Perugia; Italy
    Affiliation: Toblli, Jorge Eduardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentina
    Affiliation: Milei, Jose. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones Cardiológicas. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas; Argentin